[SPARK-54004][SQL] Fix uncaching table by name without cascading#52712
[SPARK-54004][SQL] Fix uncaching table by name without cascading#52712aokolnychyi wants to merge 1 commit intoapache:masterfrom
Conversation
|
This was discovered and discussed in another PR. |
|
Ack. Thank you, @aokolnychyi . |
| } | ||
|
|
||
| plan match { | ||
| EliminateSubqueryAliases(plan) match { |
There was a problem hiding this comment.
I wonder why we keep SubqueryAlias when putting the logical plan into cache data.
There was a problem hiding this comment.
An alternative solution could be to call EliminateSubqueryAliases BEFORE putting the plan into the cache. This, however, will remove ALL subquery aliases... I was not sure about consequences, but I would be open to consider this option if everyone thinks it is safe.
Thoughts, @viirya @dongjoon-hyun @szehon-ho @cloud-fan?
There was a problem hiding this comment.
yea we don't need SubqueryAlias in the cache keys, as during lookup, we call LogicalPlan#sameResult which strips the SubqueryAlias.
However, the cache key logical plans are exposed to custom normalization rules (See SparkSessionExtensions#injectPlanNormalizationRule), so seems safer to keep it.
There was a problem hiding this comment.
Let's keep it then, it is a fragile part of code.
|
Although it looks irrelevant, please re-trigger the failed PySpark CI, @aokolnychyi . |
|
Retrying PySpark CI... |
|
Merged to master for Apache Spark 4.1.0-preview3. |
|
Thank you, @dongjoon-hyun @szehon-ho @viirya @gengliangwang @cloud-fan! |
### What changes were proposed in this pull request? This PR fixes uncaching table by name without cascading. ### Why are the changes needed? These changes are needed to invalidate data cache correctly. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR comes with a test that previously failed. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#52712 from aokolnychyi/spark-54004. Authored-by: Anton Okolnychyi <aokolnychyi@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
This PR fixes uncaching table by name without cascading.
Why are the changes needed?
These changes are needed to invalidate data cache correctly.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This PR comes with a test that previously failed.
Was this patch authored or co-authored using generative AI tooling?
No.